8 research outputs found

    Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states

    Reinforcement Learning is carried out on-line, through trial-and-error interactions of the agent with the environment, which can be very time-consuming when robots are considered. In this paper we contribute a new learning algorithm, CFQ-Learning, which uses macro-states, a low-resolution discretisation of the state space, and a partial-policy to get around obstacles, both based on the complexity of the environment's structure. The use of macro-states can prevent learning algorithms from converging, but it can accelerate the learning process. On the other hand, partial-policies can guarantee that the agent fulfils its task even when it acts over macro-states. Experiments show that CFQ-Learning strikes a good balance between policy quality and learning rate. Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), GRICES, FAPESP and CNPq.
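
    As a concrete illustration of the macro-state idea, here is a minimal Q-learning sketch in Python that learns over a low-resolution discretisation of a grid world. It is not the paper's CFQ-Learning (which adds the partial-policy for obstacles); the grid layout, CELL size, and function names are assumptions for illustration.

        # Minimal Q-learning over macro-states: the Q-table is indexed by a
        # coarse discretisation of the (x, y) grid, not by individual cells.
        # CELL, ACTIONS and macro() are illustrative assumptions.
        import random
        from collections import defaultdict

        CELL = 4                     # each macro-state covers a CELL x CELL block
        ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
        ACTIONS = ["N", "S", "E", "W"]
        Q = defaultdict(float)       # Q[(macro_state, action)]

        def macro(state):
            """Map a fine-grained (x, y) cell to its low-resolution macro-state."""
            x, y = state
            return (x // CELL, y // CELL)

        def choose(state):
            """Epsilon-greedy action selection over the macro-state's Q-values."""
            m = macro(state)
            if random.random() < EPS:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: Q[(m, a)])

        def update(s, a, r, s_next):
            """Standard Q-learning update, applied at the macro-state level."""
            m, m_next = macro(s), macro(s_next)
            best_next = max(Q[(m_next, b)] for b in ACTIONS)
            Q[(m, a)] += ALPHA * (r + GAMMA * best_next - Q[(m, a)])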

    Markov decision processes for ad network optimization

    In this paper we examine a central problem in a particular advertising scheme: we are concerned with matching marketing campaigns that produce advertisements (“ads”) to impressions — where “impression” is a general term for any space on the internet that can display an ad. We propose a new take on the problem by resorting to planning techniques based on Markov Decision Processes and to plan-generation techniques that have been developed in the AI literature. We present a detailed formulation of the Markov Decision Process approach and results of simulated experiments. Anna Helena Reali Costa and Fábio Gagliardi Cozman are partially supported by CNPq. Flávio Sales Truzzi is supported by CAPES. The work reported here has received substantial support through FAPESP grants 2008/03995-5 and 2011/19280-8.
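
    To make the formulation concrete, the following toy sketch treats ad allocation as a finite-horizon MDP: the state is the remaining budget of each campaign, the action is which campaign's ad to show to the next impression, and the reward is that campaign's expected click value. The campaign names, values and budgets are invented for illustration; the paper's actual model is richer.

        # Toy finite-horizon MDP for ad allocation, solved by exhaustive
        # dynamic programming. CLICK_VALUE and INITIAL_BUDGET are invented.
        from functools import lru_cache

        CLICK_VALUE = {"camp_A": 0.9, "camp_B": 0.5}   # expected value per shown ad
        INITIAL_BUDGET = {"camp_A": 2, "camp_B": 3}    # impressions each can still buy

        @lru_cache(maxsize=None)
        def V(budgets, impressions_left):
            """Optimal expected value given remaining budgets (a tuple of
            (campaign, remaining) pairs) and impressions still to allocate."""
            if impressions_left == 0:
                return 0.0
            best = 0.0
            for i, (camp, left) in enumerate(budgets):
                if left == 0:
                    continue            # this campaign's budget is exhausted
                spent = budgets[:i] + ((camp, left - 1),) + budgets[i + 1:]
                best = max(best, CLICK_VALUE[camp] + V(spent, impressions_left - 1))
            return best

        start = tuple(sorted(INITIAL_BUDGET.items()))
        print(V(start, 4))   # value of an optimal allocation of 4 impressions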

    Speeding-up reinforcement learning through abstraction and transfer learning

    We are interested in the following general question: is it possible to abstract knowledge that is generated while learning the solution of a problem, so that this abstraction can accelerate the learning process? Moreover, is it possible to transfer and reuse the acquired abstract knowledge to accelerate the learning process for future similar tasks? We propose a framework for conducting two levels of reinforcement learning simultaneously, in which an abstract policy is learned alongside a concrete policy for the problem, and both policies are refined through exploration and interaction of the agent with the environment. We explore abstraction both to accelerate the learning of an optimal concrete policy for the current problem and to allow the application of the generated abstract policy to learning solutions for new problems. We report experiments in a robot navigation environment that show our framework to be effective in speeding up policy construction for practical problems and in generating abstractions that can be used to accelerate learning in new, similar problems. This research was partially supported by FAPESP (2011/19280-8, 2012/02190-9, 2012/19627-0) and CNPq (311058/2011-6, 305395/2010-6).
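
    The two-level idea can be sketched as two Q-tables updated from the same experience stream, one over concrete states and one over their abstractions. The sketch below assumes a grid world whose cells are abstracted into rooms; the room_of() function and the constants are illustrative assumptions, not the paper's own definitions.

        # Two Q-tables learned from the same experience: one over concrete
        # grid cells, one over their "room" abstractions.
        from collections import defaultdict

        ALPHA, GAMMA = 0.1, 0.95
        ACTIONS = ["N", "S", "E", "W"]
        Q_concrete = defaultdict(float)
        Q_abstract = defaultdict(float)

        def room_of(state):
            """Hypothetical abstraction: the room a grid cell belongs to."""
            x, y = state
            return (x // 10, y // 10)

        def learn(s, a, r, s_next):
            """Update both levels from a single observed transition."""
            best_c = max(Q_concrete[(s_next, b)] for b in ACTIONS)
            Q_concrete[(s, a)] += ALPHA * (r + GAMMA * best_c - Q_concrete[(s, a)])
            m, m_next = room_of(s), room_of(s_next)
            best_a = max(Q_abstract[(m_next, b)] for b in ACTIONS)
            Q_abstract[(m, a)] += ALPHA * (r + GAMMA * best_a - Q_abstract[(m, a)])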

    AVALIAÇÃO DE POLÍTICAS ABSTRATAS NA TRANSFERÊNCIA DE CONHECIMENTO EM NAVEGAÇÃO ROBÓTICA (Comparison and Evaluation of Abstract Policies for Transfer Learning in Robot Navigation Tasks)

    This paper presents a new approach to the problem of solving a new task by using knowledge previously acquired while solving a similar task in the same domain, robot navigation. A new algorithm, Qab-Learning, is proposed to obtain the abstract policy that guides the agent in the task of reaching a goal location from any other location in the environment, and this policy is compared to the policy derived from another algorithm in the literature, ND-TILDE. The policies are applied to a number of different tasks in two environments. The results show that the policies, even after the process of abstraction, have a positive impact on the performance of the agent.
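
    One plausible reading of how an abstract policy is obtained and reused is sketched below: distill a greedy abstract policy from a learned Q-table by majority vote over concrete states, then use it to bias exploration in a new task. This is an illustrative reconstruction, not the Qab-Learning or ND-TILDE algorithm itself.

        # Distill an abstract policy from a learned Q-table by majority vote,
        # then use it to bias exploration in a new task. Illustrative only.
        import random
        from collections import Counter, defaultdict

        ACTIONS = ["N", "S", "E", "W"]

        def abstract_policy(Q, abstraction):
            """For each abstract state, keep the action that is greedy for the
            majority of its concrete states under Q."""
            votes = defaultdict(Counter)
            for s in {s for (s, _) in Q}:
                greedy = max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
                votes[abstraction(s)][greedy] += 1
            return {m: c.most_common(1)[0][0] for m, c in votes.items()}

        def biased_choice(pi_abs, abstraction, s, eps=0.3):
            """In the new task, follow the transferred abstract policy with
            probability 1 - eps; otherwise explore uniformly."""
            m = abstraction(s)
            if m in pi_abs and random.random() > eps:
                return pi_abs[m]
            return random.choice(ACTIONS)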

    Preference elicitation using evaluation over observed behaviours.

    Recently, a variety of tasks have been delegated to computer systems, mainly when a computer system is more reliable than a human being or when the task is not suitable for one. Preference elicitation helps to accomplish such delegation by enabling even lay people to easily program a computer system with their own preferences. A person's preferences are elicited through their answers to specific questions, which the computer system formulates by itself. The person acts as a user of the computer system, while the system can be seen as an agent that acts in the person's place. The structure and context of the questions have been pointed to as sources of variance in the user's answers, and such variance can jeopardize the feasibility of preference elicitation.
    One way to avoid such variance is to ask the user to choose between two behaviours that they have observed. Relative evaluation of observed behaviours makes questions simpler and more transparent to the user, decreasing the variance, but it may not be easy for the agent to interpret such evaluations. If the agent's and the user's perceptions diverge, the agent may be unable to learn the user's preferences: evaluations are generated from the user's perception, but all an agent can do is relate them to its own perception. Another issue is that questions, which are exposed to the user through demonstrated behaviours, are now constrained by the dynamics of the environment, so a behaviour cannot be chosen arbitrarily: it must be feasible, and a policy must be executed in the environment for the behaviour to be demonstrated. While the first issue affects the inference of how the user evaluates behaviours, the second affects how fast and how accurately the learning process can proceed. This thesis proposes the problem of Preference Elicitation based on Evaluations over Observed Behaviours within the Markov Decision Process framework, and develops theoretical properties in that framework that make the problem computationally feasible. The problem of differing perceptions is analysed and constrained solutions are developed. The problem of demonstrating behaviours is analysed through the formulation of questions based on stationary policies and on policy replanning; algorithms implementing both kinds of questions were tested to solve preference elicitation in a scenario under constrained conditions.
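
    To illustrate the kind of learning the thesis studies, the sketch below estimates a user's reward weights from pairwise preferences over observed behaviours, assuming each behaviour is summarised by discounted feature counts and that the user's utility is linear in those features. The logistic (Bradley-Terry-style) update is a standard stand-in estimator, not the thesis's own method.

        # Estimate linear reward weights from pairwise preferences over
        # observed behaviours. feature_counts() summarises a trajectory.
        import math

        def feature_counts(trajectory, n_features, gamma=0.95):
            """Discounted sum of per-step feature vectors along one behaviour."""
            phi = [0.0] * n_features
            for t, step_features in enumerate(trajectory):
                for i, f in enumerate(step_features):
                    phi[i] += (gamma ** t) * f
            return phi

        def update_weights(w, phi_preferred, phi_other, lr=0.1):
            """One gradient step on the likelihood of the observed choice:
            the user preferred `phi_preferred` over `phi_other`."""
            diff = [p - q for p, q in zip(phi_preferred, phi_other)]
            score = sum(wi * di for wi, di in zip(w, diff))
            p_choose = 1.0 / (1.0 + math.exp(-score))  # predicted choice probability
            return [wi + lr * (1.0 - p_choose) * di for wi, di in zip(w, diff)]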

    Políticas Educacionais e Pesquisas Acadêmicas sobre Dança na Escola no Brasil: um movimento em rede (Educational Policies and Academic Research on Dance in Schools in Brazil: a networked movement)


    Transfer Learning for Multiagent Reinforcement Learning Systems
